A hybrid method for clause splitting in unrestricted English texts

Author

  • Constantin Orăsan
Abstract

Many NLP tasks depend on knowing the structure of the sentence. In this paper we propose a hybrid method for clause splitting in unrestricted English texts which requires less human work than existing approaches. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a shallow rule-based module in order to improve the accuracy of the method. The evaluation showed that the machine learning algorithm is useful for identifying clause boundaries and that the rule-based module improves the results. Using some very simple rules we can report precision of around
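
The two-stage idea in the abstract (a learned component proposes clause boundaries, then a shallow rule-based module filters them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the paper's features, learning algorithm, or rules, so `propose_boundaries` is only a hand-written stand-in for a trained classifier, the word lists are hypothetical, and the sentence and its gold boundary are invented.

```python
from typing import List, Set

# Hypothetical word lists, invented for this illustration only.
CLAUSE_OPENERS = {"because", "although", "while", "if", "when"}
VERBS = {"stayed", "rained"}


def propose_boundaries(tokens: List[str]) -> Set[int]:
    """Stand-in for a trained classifier: indices where a clause may start."""
    candidates = set()
    for i, tok in enumerate(tokens):
        if i == 0:
            continue
        # Over-generate on purpose: any clause opener or any token after a comma.
        if tok.lower() in CLAUSE_OPENERS or tokens[i - 1] == ",":
            candidates.add(i)
    return candidates


def apply_rules(tokens: List[str], candidates: Set[int]) -> Set[int]:
    """Shallow post-processing: drop a candidate if the span it opens has no verb."""
    ordered = sorted(candidates)
    kept = set()
    for pos, start in enumerate(ordered):
        # The span runs up to the next candidate, or to the end of the sentence.
        end = ordered[pos + 1] if pos + 1 < len(ordered) else len(tokens)
        if any(tok.lower() in VERBS for tok in tokens[start:end]):
            kept.add(start)
    return kept


def precision(predicted: Set[int], gold: Set[int]) -> float:
    """Fraction of predicted boundaries that are also in the gold set."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


tokens = "We stayed home , because it rained , sadly .".split()
gold = {4}  # hand-assigned boundary for this toy sentence
stage1 = propose_boundaries(tokens)
stage2 = apply_rules(tokens, stage1)
print(precision(stage1, gold), precision(stage2, gold))
```

On this toy sentence the rule removes the spurious boundary before "sadly", so precision rises after the rule stage. This mirrors the qualitative claim of the abstract only; it says nothing about the figures reported in the paper.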

Similar resources

A Multilingual Method for Clause Splitting

This paper addresses the clause splitting problem and proposes a multilingual method for detecting clause boundaries in unrestricted texts. The method combines language independent machine learning techniques with language specific rules in order to take the first step in building the hierarchical structure of sentences. The results of a machine learning algorithm, trained on an annotated corpu...

Three-Phase Modeling of Dynamic Kill in Gas-Condensate Well Using Advection Upstream Splitting Method Hybrid Scheme

Understanding and modeling three-phase transient flow in gas-condensate wells plays a vital role in designing and optimizing the dynamic kill procedure of each well, which needs to capture not only the discontinuities in density, geometry, and velocity of phases but also the effect of temperature on such parameters. In this study, two-phase Advection-Upstream-Splitting-Method (AUSMV) hybrid scheme is extend...

Aligning and Matching of English-Chinese Bilingual Texts of CNS News

This paper presents a project to align and match English-Chinese bilingual news files downloaded from China News Service’s website. The work involves the alignment of bilingual texts at the sentence and clause levels. In addition, the work also requires matching of files, as the English and Chinese news files downloaded from the web do not come in the same sequential order. These news files have...

A Surface-Based Approach To Identifying Discourse Markers And Elementary Textual Units In Unrestricted Texts

I present a surface-based algorithm that employs knowledge of cue phrase usages in order to automatically determine clause boundaries and discourse markers in unrestricted natural language texts. The knowledge was derived from a comprehensive corpus analysis.

The Effect of User-Friendly Texts vs. Impersonal and Hybrid Texts on the Reading Comprehension Ability of Iranian EFL Learners

This study focuses on the effect of user-friendly, impersonal, and hybrid texts on the reading comprehension ability of Iranian foreign language learners. Forty-five students of Alzahra University were selected on the basis of their performance in a recent TOEFL. They were given three different texts (each group of 15 students was given one type) describing the same area of English usage, w...

Publication date: 2000